compound character
BanglaNet: Bangla Handwritten Character Recognition using Ensembling of Convolutional Neural Network
Saha, Chandrika, Rahman, Md Mostafijur
Handwritten character recognition is a crucial task because of its abundant applications. The recognition task of Bangla handwritten characters is especially challenging because of the cursive nature of Bangla characters and the presence of compound characters with more than one way of writing. In this paper, a classification model based on the ensembling of several Convolutional Neural Networks (CNN), namely, BanglaNet is proposed to classify Bangla basic characters, compound characters, numerals, and modifiers. Three different models based on the idea of state-of-the-art CNN models like Inception, ResNet, and DenseNet have been trained with both augmented and non-augmented inputs. Finally, all these models are averaged or ensembled to get the finishing model. Rigorous experimentation on three benchmark Bangla handwritten characters datasets, namely, CMATERdb, BanglaLekha-Isolated, and Ekush has exhibited significant recognition accuracies compared to some recent CNN-based research. The top-1 recognition accuracies obtained are 98.40%, 97.65%, and 97.32%, and the top-3 accuracies are 99.79%, 99.74%, and 99.56% for CMATERdb, BanglaLekha-Isolated, and Ekush datasets respectively.
MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters
Ferdous, Jannatul, Karmaker, Suvrajit, Rabby, A K M Shahariar Azad, Hossain, Syed Akhter
At present, recognition of the Bangla handwriting compound character has been an essential issue for many years. In recent years there have been application-based researches in machine learning, and deep learning, which is gained interest, and most notably is handwriting recognition because it has a tremendous application such as Bangla OCR. MatrriVasha, the project which can recognize Bangla, handwritten several compound characters. Currently, compound character recognition is an important topic due to its variant application, and helps to create old forms, and information digitization with reliability. But unfortunately, there is a lack of a comprehensive dataset that can categorize all types of Bangla compound characters. MatrriVasha is an attempt to align compound character, and it's challenging because each person has a unique style of writing shapes. After all, MatrriVasha has proposed a dataset that intends to recognize Bangla 120(one hundred twenty) compound characters that consist of 2552(two thousand five hundred fifty-two) isolated handwritten characters written unique writers which were collected from within Bangladesh. This dataset faced problems in terms of the district, age, and gender-based written related research because the samples were collected that includes a verity of the district, age group, and the equal number of males, and females. As of now, our proposed dataset is so far the most extensive dataset for Bangla compound characters. It is intended to frame the acknowledgment technique for handwritten Bangla compound character. In the future, this dataset will be made publicly available to help to widen the research.
Segmentation of Offline Handwritten Bengali Script
Basu, Subhadip, Chaudhuri, Chitrita, Kundu, Mahantapas, Nasipuri, Mita, Basu, Dipak K.
Character segmentation is one of the most important decision processes for optical character recognition (OCR). Isolating individual alphabetic characters in the script image is often significant enough to make a decisive contribution towards the success rate of the overall system. An OCR system may be designed to work for either of online and off-line purposes. Online OCR systems collect input data by recording the order of strokes made by the write on an electronic bit-pad, and off-line OCR systems do the same by recording the pixel by pixel digital image of the entire writing with a digital scanner. OCR has a wide field of application covering handwritten document transcription, automatic mail address recognition, machine processing of bankchecks, faxes etc. Off-line OCR of hand written words has long been an active area research. Some important contributions so far made in this field involve analysis of English texts [1], [2], [3], [5], Chinese script [6] and Arabic characters [9]. With this background of research, the present work considers Bengali script for developing suitable techniques for off-line OCR with it.